
Collaborating Authors

 Hưng Yên Province


VLM6D: VLM based 6Dof Pose Estimation based on RGB-D Images

Sarowar, Md Selim, Kim, Sungho

arXiv.org Artificial Intelligence

A central challenge in computer vision is the precise estimation of 6D object pose; however, many current approaches remain fragile and struggle to generalize from synthetic data to real-world conditions with varying lighting, textureless objects, and significant occlusion. To address these limitations, we propose VLM6D, a novel dual-stream architecture that leverages the distinct strengths of visual and geometric data from RGB-D input for robust and precise pose estimation. Our framework uniquely integrates two specialized encoders. A powerful, self-supervised Vision Transformer (DINOv2) processes the RGB modality, harnessing its rich, pre-trained understanding of visual grammar to achieve strong resilience against texture and lighting variations. Concurrently, a PointNet++ encoder processes the 3D point cloud derived from depth data, enabling robust geometric reasoning that holds up even with the sparse, fragmented data typical of severe occlusion. These complementary feature streams are fused to inform a multi-task prediction head. Comprehensive experiments demonstrate that VLM6D achieves new state-of-the-art (SOTA) performance on the challenging Occluded-LineMOD benchmark, validating its robustness and accuracy.
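The abstract states that the PointNet++ stream consumes a point cloud derived from depth data but does not say how that cloud is built. A common approach is to back-project every valid depth pixel through pinhole camera intrinsics. The minimal sketch below assumes that model, with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` and depth in metres; it is an illustration of the general technique, not the VLM6D preprocessing code.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project a depth map into camera-frame 3D points (pinhole model)."""
    points: List[Point] = []
    for v, row in enumerate(depth):          # v: pixel row index
        for u, z in enumerate(row):          # u: pixel column index, z: depth
            if z <= 0:                       # treat non-positive depth as missing
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Hypothetical 2x2 depth map (metres) with one missing reading.
pts = depth_to_points([[1.0, 0.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts)  # [(-0.5, -0.5, 1.0), (-1.0, 1.0, 2.0), (1.0, 1.0, 2.0)]
```

Skipping zero-depth pixels matters under occlusion and sensor dropout, which is exactly the sparse, fragmented input the abstract says the geometric stream must handle.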


From Federated Learning to Quantum Federated Learning for Space-Air-Ground Integrated Networks

Quy, Vu Khanh, Quy, Nguyen Minh, Hoai, Tran Thi, Shaon, Shaba, Uddin, Md Raihan, Nguyen, Tien, Nguyen, Dinh C., Kaushik, Aryan, Chatzimisios, Periklis

arXiv.org Artificial Intelligence

6G wireless networks are expected to provide seamless, data-driven connectivity spanning space, air, ground, and underwater networks. As a core component of future 6G networks, Space-Air-Ground Integrated Networks (SAGIN) have been envisioned to support a wide range of real-time intelligent applications. To realize this, integrating AI techniques into SAGIN is an inevitable trend. Given the distributed and heterogeneous architecture of SAGIN, federated learning (FL) and, more recently, quantum FL (QFL) are emerging AI model training techniques for enabling future privacy-enhanced and computation-efficient SAGINs. In this work, we explore the vision of using FL/QFL in SAGINs. We present several representative applications enabled by the integration of FL and QFL in SAGINs. A case study of QFL over UAV networks is also given, showing the merit of the quantum-enabled training approach over the conventional FL benchmark. Research challenges and standardization efforts for QFL adoption in future SAGINs are also highlighted.


A Systematic Survey on Large Language Models for Algorithm Design

Liu, Fei, Yao, Yiming, Guo, Ping, Yang, Zhiyuan, Zhao, Zhe, Lin, Xi, Tong, Xialiang, Yuan, Mingxuan, Lu, Zhichao, Wang, Zhenkun, Zhang, Qingfu

arXiv.org Artificial Intelligence

Algorithm Design (AD) is crucial for effective problem-solving across various domains. The advent of Large Language Models (LLMs) has notably enhanced automation and innovation within this field, offering new perspectives and promising solutions. Over the past three years, the integration of LLMs into AD (LLM4AD) has seen substantial progress, with applications spanning optimization, machine learning, mathematical reasoning, and scientific discovery. Given the rapid advancements and expanding scope of this field, a systematic review is both timely and necessary. This paper provides a systematic review of LLM4AD. First, we offer an overview and summary of existing studies. Then, we introduce a taxonomy and review the literature across four dimensions: the roles of LLMs, search methods, prompt methods, and application domains, along with a discussion of the potential and achievements of LLMs in AD. Finally, we identify current challenges and highlight several promising directions for future research.


Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges

Van Dinh, Nguyen, Dang, Thanh Chi, Nguyen, Luan Thanh, Van Nguyen, Kiet

arXiv.org Artificial Intelligence

Vietnamese, a low-resource language, is typically categorized into three primary dialect groups corresponding to Northern, Central, and Southern Vietnam. However, each province within these regions exhibits its own distinct pronunciation variations. Although various speech recognition datasets exist, none provides a fine-grained classification of the 63 dialects specific to individual provinces of Vietnam. To address this gap, we introduce the Vietnamese Multi-Dialect (ViMD) dataset, a novel, comprehensive dataset capturing the rich diversity of 63 provincial dialects spoken across Vietnam. Our dataset comprises 102.56 hours of audio, consisting of approximately 19,000 utterances, and the associated transcripts contain over 1.2 million words. To provide benchmarks and simultaneously demonstrate the challenges of our dataset, we fine-tune state-of-the-art pre-trained models on two downstream tasks: (1) dialect identification and (2) speech recognition. The empirical results point to two implications: the influence of geographical factors on dialects, and the limitations of current approaches to speech recognition on multi-dialect speech data. Our dataset is available for research purposes.


Enabling Trustworthy Federated Learning in Industrial IoT: Bridging the Gap Between Interpretability and Robustness

Jagatheesaperumal, Senthil Kumar, Rahouti, Mohamed, Alfatemi, Ali, Ghani, Nasir, Quy, Vu Khanh, Chehri, Abdellah

arXiv.org Artificial Intelligence

Federated Learning (FL) represents a paradigm shift in machine learning, allowing collaborative model training while keeping data localized. This approach is particularly pertinent in the Industrial Internet of Things (IIoT) context, where data privacy, security, and efficient utilization of distributed resources are paramount. The essence of FL in IIoT lies in its ability to learn from diverse, distributed data sources without requiring central data storage, thus enhancing privacy and reducing communication overhead. However, despite its potential, several challenges impede the widespread adoption of FL in IIoT, notably ensuring interpretability and robustness. This article focuses on enabling trustworthy FL in IIoT by bridging the gap between interpretability and robustness, which is crucial for enhancing trust, improving decision-making, and ensuring regulatory compliance. Moreover, the design strategies summarized in this article aim to make FL systems in IIoT transparent and reliable, which is vital in industrial settings where decisions have significant safety and economic impacts. Case studies of IIoT environments driven by trustworthy FL models are also provided, highlighting practical insights into trustworthy communication between IIoT systems and their end users.
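The abstract describes FL as collaborative training in which raw data stay on each device, but it does not specify the aggregation rule. The sketch below assumes the widely used Federated Averaging (FedAvg) rule, in which the server averages client model weights in proportion to each client's number of local samples. The function name `fed_avg` and the flat weight-vector representation are illustrative assumptions, not taken from the article.

```python
from typing import List, Sequence, Tuple

Weights = List[float]


def fed_avg(client_updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """FedAvg-style aggregation: sample-weighted mean of client weight vectors."""
    total = sum(n_samples for _, n_samples in client_updates)
    if total <= 0:
        raise ValueError("clients reported no training samples")
    dim = len(client_updates[0][0])
    global_weights = [0.0] * dim
    for weights, n_samples in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        share = n_samples / total  # this client's fraction of all local samples
        for i, w in enumerate(weights):
            global_weights[i] += share * w
    return global_weights


# Hypothetical round: three IIoT devices send locally trained weights
# (never their raw sensor data) together with their local sample counts.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300), ([0.0, 0.0], 100)]
print(fed_avg(updates))  # approximately [2.0, 2.8]
```

Only model parameters and sample counts cross the network, which is the privacy property the abstract emphasizes; interpretability and robustness mechanisms would sit on top of, or replace, this plain averaging step.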


Automatic Prompt Selection for Large Language Models

Do, Viet-Tung, Hoang, Van-Khanh, Nguyen, Duy-Hung, Sabahi, Shahab, Yang, Jeff, Hotta, Hajime, Nguyen, Minh-Tien, Le, Hung

arXiv.org Artificial Intelligence

Large Language Models (LLMs) can perform various natural language processing tasks when given suitable instruction prompts. However, designing effective prompts manually is challenging and time-consuming. Existing methods for automatic prompt optimization lack either flexibility or efficiency. In this paper, we propose an effective approach to automatically select the optimal prompt for a given input from a finite set of synthetic candidate prompts. Our approach consists of three steps: (1) clustering the training data and generating candidate prompts for each cluster using an LLM-based prompt generator; (2) synthesizing a dataset of input-prompt-output tuples for training a prompt evaluator to rank the prompts based on their relevance to the input; (3) using the prompt evaluator to select the best prompt for a new input at test time. Our approach balances prompt generality and specificity and eliminates the need for resource-intensive training and inference. It demonstrates competitive performance on the zero-shot question-answering datasets GSM8K, MultiArith, and AQuA.
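Step (3) of the approach amounts to an argmax over the finite candidate set, scored by the evaluator for the incoming input. In the minimal sketch below, the trained prompt evaluator is replaced by a hypothetical word-overlap (Jaccard) scorer purely for illustration; in the paper, the evaluator is a model trained on the input-prompt-output tuples synthesized in step (2). The names `select_prompt` and `overlap_score` are hypothetical.

```python
import re
from typing import Callable, Sequence


def select_prompt(query: str, candidates: Sequence[str],
                  evaluator: Callable[[str, str], float]) -> str:
    """Return the candidate prompt that the evaluator scores highest for `query`."""
    if not candidates:
        raise ValueError("candidate prompt set is empty")
    # Step (3): score every candidate for this input and keep the best one.
    return max(candidates, key=lambda prompt: evaluator(query, prompt))


def overlap_score(query: str, prompt: str) -> float:
    """Hypothetical stand-in for a trained evaluator: Jaccard overlap of words."""
    q = set(re.findall(r"[a-z]+", query.lower()))
    p = set(re.findall(r"[a-z]+", prompt.lower()))
    union = q | p
    return len(q & p) / len(union) if union else 0.0


candidates = [
    "Solve the arithmetic word problem step by step.",
    "Answer the multiple-choice question by eliminating wrong options.",
]
query = "Tom had 5 apples and ate 2. How many apples are left? Solve this word problem."
print(select_prompt(query, candidates, overlap_score))
```

Because the evaluator is passed in as a plain callable, a learned ranking model can be swapped in without changing the selection logic; test-time cost is one evaluator call per candidate.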


A Comparative Study of Real-Time Implementable Cooperative Aerial Manipulation Systems

Barakou, Stamatina C., Tzafestas, Costas S., Valavanis, Kimon P.

arXiv.org Artificial Intelligence

Research and development in Unmanned Aerial Vehicles (UAVs), or Unmanned Aircraft Systems (UAS), has witnessed unprecedented scientific and commercial interest and growth, particularly during the last two decades. Although military applications dominated the global market for years, interest in using UAVs in civil and public domains has been increasing exponentially worldwide, despite the challenges of integrating unmanned aviation into national airspace. Sample applications include, but are not limited to, surveillance [1], search and rescue [2], aerial photography [3], fire monitoring [4], agriculture [5], and aerial delivery [6]. All of these are passive tasks, that is, tasks that require no UAV interaction with the environment. However, contact with the environment is required in industrial and maintenance applications such as bridge inspection, water dam inspection, high-voltage transmission line inspection [7], assembly tasks [8], and construction [9].


VLSP 2023 -- LTER: A Summary of the Challenge on Legal Textual Entailment Recognition

Tran, Vu, Nguyen, Ha-Thanh, Vo, Trung, Luu, Son T., Dang, Hoang-Anh, Le, Ngoc-Cam, Le, Thi-Thuy, Nguyen, Minh-Tien, Nguyen, Truong-Son, Nguyen, Le-Minh

arXiv.org Artificial Intelligence

In this new era of rapid AI development, especially in language processing, the demand for AI in the legal domain is increasingly critical. While research in other languages such as English, Japanese, and Chinese is well established, we introduce the first fundamental research on Vietnamese in the legal domain: legal textual entailment recognition, organized as a challenge at the Vietnamese Language and Speech Processing (VLSP) workshop. In analyzing participants' results, we discuss linguistic aspects critical to the legal domain that pose challenges still to be addressed.


When Giant Language Brains Just Aren't Enough! Domain Pizzazz with Knowledge Sparkle Dust

Nguyen, Minh-Tien, Nguyen, Duy-Hung, Sabahi, Shahab, Le, Hung, Yang, Jeff, Hotta, Hajime

arXiv.org Artificial Intelligence

Large language models (LLMs) have significantly advanced the field of natural language processing, with GPT models at the forefront. While their remarkable performance spans a range of tasks, adapting LLMs to real-world business scenarios still poses challenges that warrant further investigation. This paper presents an empirical analysis aimed at bridging the gap in adapting LLMs to practical use cases. To that end, we select insurance question answering (QA) as a case study because of the reasoning it demands. For this task, we design a new model built on LLMs augmented with additional knowledge extracted from insurance policy rulebooks and DBpedia. The additional knowledge helps LLMs understand new insurance concepts, enabling domain adaptation. Preliminary results on two QA datasets show that knowledge enhancement significantly improves the reasoning ability of GPT-3.5 (55.80% and 57.83% in terms of accuracy). The analysis also indicates that existing public knowledge bases, e.g., DBpedia, are beneficial for knowledge enhancement. Our findings reveal that the inherent complexity of business scenarios often necessitates incorporating domain-specific knowledge and external resources for effective problem-solving.


Modelling customer churn for the retail industry in a deep learning based sequential framework

Equihua, Juan Pablo, Nordmark, Henrik, Ali, Maged, Lausen, Berthold

arXiv.org Artificial Intelligence

As retailers around the world increase their efforts to develop targeted marketing campaigns for different audiences, accurately predicting ahead of time which customers are most likely to churn is crucial for marketing teams seeking to increase business profits. This work presents a deep survival framework to predict which customers are at risk of ceasing to purchase from retail companies in non-contractual settings. By having recurrent neural networks learn the survival model parameters, we obtain individual-level survival models of purchasing behaviour based solely on each customer's own behaviour, and we avoid the time-consuming feature engineering usually required when training machine learning models.
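The abstract does not name the survival distribution whose parameters the recurrent network learns. The sketch below assumes that the time until a customer stops purchasing follows a Weibull distribution (a parameterization also used by WTTE-RNN-style models), and it takes the per-customer `scale` and `shape` as given rather than producing them with an actual RNN. It shows how such individual parameters turn into a conditional churn probability over a horizon, computed through the cumulative hazard so that two tiny survival values are never divided. All names are hypothetical.

```python
import math


def _check(scale: float, shape: float) -> None:
    if scale <= 0 or shape <= 0:
        raise ValueError("Weibull scale and shape must be positive")


def weibull_survival(t: float, scale: float, shape: float) -> float:
    """S(t) = exp(-(t/scale)**shape): probability the customer is still active at t."""
    _check(scale, shape)
    return math.exp(-((t / scale) ** shape))


def churn_risk(elapsed: float, horizon: float, scale: float, shape: float) -> float:
    """P(churn within `horizon` | still active after `elapsed`).

    Equals 1 - S(elapsed + horizon) / S(elapsed), evaluated through the
    cumulative hazard H(t) = (t/scale)**shape.
    """
    _check(scale, shape)
    h_now = (elapsed / scale) ** shape
    h_later = ((elapsed + horizon) / scale) ** shape
    return 1.0 - math.exp(-(h_later - h_now))


# Hypothetical per-customer parameters, as an RNN might emit them after
# reading that customer's purchase history (time unit: weeks).
scale, shape = 10.0, 2.0
print(round(churn_risk(elapsed=4.0, horizon=4.0, scale=scale, shape=shape), 3))
```

With `shape` equal to 1 the conditional risk no longer depends on `elapsed` (the exponential, memoryless case); other shape values make the risk time-dependent, which is what allows per-customer models to rank customers differently at different points in their history.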